In the TCGA cohort we found 11 variants of interest, of which 9 variants can be validated by RNASeq data.
The samples of interest are those where the variant has been called either in DNA or RNA, and have RNASeq data available for validation. The splice junction search is made based on the STAR-SJCounts 1-based intronic positions.
Splicing alterations to be evaluated:
Variant found in 5 patients of the TCGA (5 samples).
The splicing alterations being assessed are:
Variant information:
Load the extracted splice junctions of the gene harboring the mutation.
extractedSJ_path <- paste0(extractedSJ_dir_in,"NRAS_UM_annotSJ.tsv")
GeneSJ <- read.delim(extractedSJ_path, sep ="\t")
Set the sample’s group: Mutated (MUT) or No Mutated (WT)
samples_df <- found_variants[found_variants$Gene=="NRAS" & found_variants$MutationKey_Hg38 == "chr1,114716123,C,T",]
cases <- samples_df$RNA_Sample[samples_df$Validable == "Validable"]
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases , "MUT", "WT")
Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).
Search: predicted at 4 bp from the variant
Show all the splice junctions containing the position 114716126
colnames(GeneSJ)[grep("114716126",colnames(GeneSJ))]
## [1] "chr1_114709722_114716126"
Found: chr1_114709722_114716126
Reads of all the AML samples (mutated and no mutated) for the splice junction:
GeneSJ$chr1_114709722_114716126
## [1] 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [149] 0 0
Samples with the SJ of interest:
table(GeneSJ$chr1_114709722_114716126>0)
##
## FALSE TRUE
## 149 1
Groups of the samples having the alternative splice junction:
table(GeneSJ$GROUP[GeneSJ$chr1_114709722_114716126 > 0])
##
## WT
## 1
Alternative SJ not found in the mutated samples.
Search: chr1:114713979-114716657
Show all the splice junctions containing the position 114713979
colnames(GeneSJ)[grep("114713979",colnames(GeneSJ))]
## [1] "chr1_114713979_114716049" "chr1_114713979_114716657"
Found: chr1:114713979-114716657
Reads of all the AML samples (mutated and no mutated) for the splice junction:
GeneSJ$chr1_114713979_114716657
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [38] 0 7 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0
## [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0
## [149] 0 0
Samples with the SJ of interest:
table(GeneSJ$chr1_114713979_114716657>0)
##
## FALSE TRUE
## 139 11
Groups of the samples having the alternative splice junction:
table(GeneSJ$GROUP[GeneSJ$chr1_114713979_114716657 > 0])
##
## MUT WT
## 1 10
Alternative SJ found in the mutated samples.
Exon1-2: chr1:114716178-114716657
Exon2-3: chr1:114713979-114716049
Reads of all the AML samples (mutated and no mutated) for the splice junction:
GeneSJ$chr1_114716178_114716657
## [1] 16 28 33 19 19 20 15 3 10 25 39 15 25 13 19 40 18 16 11 9 27 21 22 3 23
## [26] 5 13 18 23 10 18 12 22 9 24 55 13 13 23 17 27 6 6 8 11 8 23 29 25 17
## [51] 5 10 9 9 25 10 21 20 30 22 10 13 13 9 12 2 19 15 1 16 13 17 25 15 26
## [76] 26 18 15 15 12 4 10 32 17 12 17 33 18 14 12 31 24 15 30 21 22 28 9 15 51
## [101] 20 11 27 4 19 31 28 23 16 24 10 12 14 21 16 22 35 17 1 15 20 8 9 6 33
## [126] 14 20 11 12 9 17 12 16 31 31 16 15 10 14 7 9 18 43 29 17 8 31 19 0 54
Reads of all the AML samples (mutated and no mutated) for the splice junction:
GeneSJ$chr1_114713979_114716049
## [1] 67 55 99 98 44 89 41 46 57 30 110 77 109 19 51 89 86 85
## [19] 66 27 69 52 96 58 82 24 29 78 76 48 85 85 68 56 104 129
## [37] 31 73 62 42 88 28 22 11 72 61 70 137 26 52 31 86 27 77
## [55] 93 74 38 46 73 101 50 36 50 27 61 19 102 43 11 77 85 89
## [73] 102 60 74 54 94 45 78 59 30 58 127 75 41 53 87 45 65 42
## [91] 121 86 48 81 93 146 127 54 49 76 78 25 114 42 73 74 67 74
## [109] 53 96 62 38 44 112 37 87 60 60 0 62 79 22 80 31 153 78
## [127] 82 42 41 37 43 77 65 114 73 75 32 66 94 39 40 85 66 93
## [145] 72 38 73 99 10 124
Count the reads of all the splice junctions of the gene harboring the variant:
GeneSJ$rowSum_SJtotal <- rowSums(GeneSJ[,grep("chr", names(GeneSJ))])
Normalization of the expression by the total read counts of all the splice junctions of the gene:
GeneSJ$Normalized_CanonEx1_2 <- (GeneSJ$chr1_114716178_114716657)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_CanonEx2_3 <- (GeneSJ$chr1_114713979_114716049)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_ES2 <- (GeneSJ$chr1_114713979_114716657)/GeneSJ$rowSum_SJtotal*100
Download the normalized values for the assessed splice junctions of all the AML samples:
Mutated samples vaf:
Canonical SJ:
Splicing alterations:
Canonical SJ:
Splicing alteration:
Violin Plots for the alternative splice junctions interrogated:
SJCounts <- GeneSJ #TCGA.NRAS.chr1-114716123-C-T.xlsx
Normality Test:
shapiro.test(SJCounts$Normalized_ES2[SJCounts$GROUP == "WT"])
##
## Shapiro-Wilk normality test
##
## data: SJCounts$Normalized_ES2[SJCounts$GROUP == "WT"]
## W = 0.22159, p-value < 2.2e-16
Value of Mean Normalized Expression of the Alternative SJ in WT samples:
mean_WT_SJi <- mean(SJCounts$Normalized_ES2[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 0.0287364
Normalized Expression Value of the Alternative SJ in the MUT sample:
MUT_SJi <- SJCounts$Normalized_ES2[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 0.0000000 0.0000000 0.0000000 0.2040816
Deviation from the mean normalized expression:
SJCounts$Difference <- SJCounts$Normalized_ES2 - mean_WT_SJi
Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression
SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] -0.0287364 -0.0287364 -0.0287364 0.1753452
v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
## x[1:11] = -0.028736, 0.11985, 0.12949, ..., 0.58476, 1.258
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))
v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.9315068 0.9315068 0.9315068 0.9520548
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_ES2")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")
MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- NA
MUT_df$Prediction <- "Exon Skipping"
MUT_df$splice_junction_status <- "AlternativeSJ found in MUT samples"
MUT_df$splice_junction_position <- "chr1:114713979-114716657"
MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
Download the vaf, inferred percentiles and pvalues of the mutated samples:
Normality:
shapiro.test(SJCounts$Normalized_ES2[SJCounts$GROUP== "WT"])
##
## Shapiro-Wilk normality test
##
## data: SJCounts$Normalized_ES2[SJCounts$GROUP == "WT"]
## W = 0.22159, p-value < 2.2e-16
shapiro.test(SJCounts$Normalized_ES2[SJCounts$GROUP== "MUT"])
##
## Shapiro-Wilk normality test
##
## data: SJCounts$Normalized_ES2[SJCounts$GROUP == "MUT"]
## W = 0.62978, p-value = 0.001241
Mann Whitney:
wt <- wilcox.test(x=SJCounts$Normalized_ES2[SJCounts$GROUP== "MUT"],
y=SJCounts$Normalized_ES2[SJCounts$GROUP== "WT"],
alternative = "two.sided",
paired = FALSE,
conf.int = 0.95)
wt
##
## Wilcoxon rank sum test with continuity correction
##
## data: SJCounts$Normalized_ES2[SJCounts$GROUP == "MUT"] and SJCounts$Normalized_ES2[SJCounts$GROUP == "WT"]
## W = 343, p-value = 0.1924
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
## -4.015703e-05 2.965998e-05
## sample estimates:
## difference in location
## 1.138317e-05
Variant found in 2 patients of the TCGA (2 samples).
The splicing alterations being assessed are:
Variant information:
Load the extracted splice junctions of the gene harboring the mutation.
extractedSJ_path <- paste0(extractedSJ_dir_in,"KRAS_UM_annotSJ.tsv")
GeneSJ <- read.delim(extractedSJ_path, sep ="\t")
Set the sample’s group: Mutated (MUT) or No Mutated (WT)
samples_df <- found_variants[found_variants$Gene=="KRAS" & found_variants$MutationKey_Hg38 == "chr12,25245347,C,T",]
cases <- samples_df$RNA_Sample[samples_df$Validable == "Validable"]
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases , "MUT", "WT")
Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).
Search: predicted a 1bp from the variant, chr12:25245345
Show all the splice junctions containing the positions between 25245340 - 25245349
colnames(GeneSJ)[grep("2524534",colnames(GeneSJ))]
## character(0)
Alternative SJ not found in the splice junction collection.
Mutated samples vaf:
Variant found in 1 patient of the TCGA (1 sample).
The splicing alterations being assessed are:
Variant information:
Load the extracted splice junctions of the gene harboring the mutation.
extractedSJ_path <- paste0(extractedSJ_dir_in,"KMT2D_UM_annotSJ.tsv")
GeneSJ <- read.delim(extractedSJ_path, sep ="\t")
Set the sample’s group: Mutated (MUT) or No Mutated (WT)
samples_df <- found_variants[found_variants$Gene=="KMT2D" & found_variants$MutationKey_Hg38 == "chr12,49022063,G,A",]
cases <- samples_df$RNA_Sample[samples_df$Validable == "Validable"]
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases , "MUT", "WT")
Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).
Search: predicted at 49022065
Show all the splice junctions containing the positions from 49022060 to 49022069
colnames(GeneSJ)[grepl("4902206", colnames(GeneSJ))]
## character(0)
Alternative SJ not found in the splice junction collection.
Show all the splice junctions containing the positions from 49022070 to 49022079
colnames(GeneSJ)[grepl("4902207", colnames(GeneSJ))]
## character(0)
Alternative SJ not found in the splice junction collection.
Mutated samples vaf:
Variant found in 2 patients of the TCGA (2 samples).
The splicing alterations being assessed are:
Variant information:
Load the extracted splice junctions of the gene harboring the mutation.
extractedSJ_path <- paste0(extractedSJ_dir_in,"FLT3_UM_annotSJ.tsv")
GeneSJ <- read.delim(extractedSJ_path, sep ="\t")
Set the sample’s group: Mutated (MUT) or No Mutated (WT)
samples_df <- found_variants[found_variants$Gene=="FLT3" & found_variants$MutationKey_Hg38 == "chr13,28018485,G,T",]
cases <- samples_df$RNA_Sample[samples_df$Validable == "Validable"]
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases , "MUT", "WT")
Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).
Search: chr13:28015702-28023349
Show all the splice junctions containing the position 28015702_28023349
colnames(GeneSJ)[grep("28015702_28023349",colnames(GeneSJ))]
## [1] "chr13_28015702_28023349"
Found: chr13:28015702-28023349
Reads of all the AML samples (mutated and no mutated) for the splice junction:
GeneSJ$chr13_28015702_28023349
## [1] 2 12 0 42 23 11 3 0 59 0 0 16 0 0 11 0 13 18 0 1 15 2 9 16 8
## [26] 0 10 10 35 30 4 8 5 8 9 0 11 4 22 16 9 2 16 3 9 5 2 7 1 10
## [51] 9 5 17 20 5 0 10 3 13 39 18 55 2 4 4 16 8 14 10 14 2 12 11 2 9
## [76] 3 12 29 4 8 3 8 5 15 22 2 5 0 0 2 13 9 6 33 19 0 5 0 14 25
## [101] 0 4 12 5 6 11 16 17 20 3 0 29 6 4 0 26 10 3 3 7 0 5 16 6 23
## [126] 2 11 0 24 12 17 12 2 4 13 13 17 7 11 27 12 9 3 17 12 0 5 0 6 0
Samples with the SJ of interest:
table(GeneSJ$chr13_28015702_28023349>0)
##
## FALSE TRUE
## 23 127
Groups of the samples having the alternative splice junction:
table(GeneSJ$GROUP[GeneSJ$chr13_28015702_28023349 > 0])
##
## MUT WT
## 2 125
Alternative SJ found in the mutated samples.
Search: chr13:28018590-28024860
Show all the splice junctions containing the positions 28018590-28024860
colnames(GeneSJ)[grep("28018590_28024860",colnames(GeneSJ))]
## [1] "chr13_28018590_28024860"
Found: chr13:28018590-28024860
Reads of all the AML samples (mutated and no mutated) for the splice junction:
GeneSJ$chr13_28018590_28024860
## [1] 0 1 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [26] 0 0 2 0 1 0 0 0 0 0 0 0 2 1 0 0 0 3 0 0 0 0 0 1 0
## [51] 0 1 3 2 1 0 0 6 0 0 3 15 0 0 1 0 0 0 0 4 0 2 1 0 0
## [76] 1 0 0 0 0 0 3 0 0 0 1 0 0 0 0 5 2 1 1 0 0 1 0 2 0
## [101] 0 0 0 0 2 0 0 2 0 0 0 1 0 5 0 1 0 0 0 0 0 0 0 2 0
## [126] 0 0 1 1 0 0 0 2 0 1 2 2 0 0 1 0 0 0 0 0 0 0 0 0 0
Samples with the SJ of interest:
table(GeneSJ$chr13_28018590_28024860>0)
##
## FALSE TRUE
## 109 41
Groups of the samples having the alternative splice junction:
table(GeneSJ$GROUP[GeneSJ$chr13_28018590_28024860 > 0])
##
## WT
## 41
Alternative SJ found in the mutated samples.
Search: chr13:28015702-28024860
Show all the splice junctions containing the positions 28015702-28024860
colnames(GeneSJ)[grep("28015702_28024860",colnames(GeneSJ))]
## [1] "chr13_28015702_28024860"
Reads of all the AML samples (mutated and no mutated) for the splice junction:
GeneSJ$chr13_28015702_28024860
## [1] 0 1 0 3 0 0 0 0 7 0 0 0 0 0 0 0 0 1 0 0 2 0 0 1 0
## [26] 0 0 0 3 1 0 0 0 0 0 0 0 0 5 1 2 0 0 0 0 0 0 0 0 0
## [51] 2 0 1 0 0 0 1 0 0 4 0 19 0 0 0 8 0 0 6 1 0 0 0 0 0
## [76] 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
## [101] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 0 0 0 0 0 0 0 0 0
## [126] 0 0 0 0 0 0 2 0 0 0 2 5 0 0 1 0 0 0 0 0 0 0 0 0 0
Samples with the SJ of interest:
table(GeneSJ$chr13_28015702_28024860 >0)
##
## FALSE TRUE
## 124 26
Groups of the samples having the alternative splice junction:
table(GeneSJ$GROUP[GeneSJ$chr13_28015702_28024860 > 0])
##
## MUT WT
## 1 25
Alternative SJ found in the mutated samples.
Exon 19-20 chr13:28018590-28023349
Exon 20-21 chr13:28015702-28018466; splice site chr13:28018466
Reads of all the AML samples (mutated and no mutated) for the splice junction:
GeneSJ$chr13_28018590_28023349
## [1] 249 515 30 590 516 602 66 9 466 125 64 282 229 30 328 89 254 370
## [19] 131 207 569 143 286 248 238 131 82 112 692 491 75 269 162 338 397 11
## [37] 268 239 451 751 302 48 306 59 233 171 73 434 225 347 293 166 494 548
## [55] 186 47 134 275 527 544 335 415 66 66 315 197 307 347 209 239 143 195
## [73] 337 70 341 194 449 349 80 134 105 461 359 360 487 342 163 53 47 22
## [91] 447 200 121 870 399 26 206 35 316 847 31 74 374 350 274 263 636 443
## [109] 314 575 91 575 227 279 109 681 419 164 100 224 93 187 429 192 912 99
## [127] 185 173 223 498 202 678 170 257 242 188 220 345 113 194 285 273 248 518
## [145] 335 65 144 102 271 3
Reads of all the AML samples (mutated and no mutated) for the splice junction:
GeneSJ$chr13_28015702_28018466
## [1] 248 498 29 616 468 622 108 14 539 164 71 291 235 33 303
## [16] 99 270 395 126 266 573 156 276 338 263 181 116 142 660 510
## [31] 78 283 199 299 405 6 331 263 453 772 333 48 336 49 224
## [46] 202 62 375 235 302 324 185 508 536 177 54 160 365 501 528
## [61] 314 476 106 73 298 312 315 422 199 182 114 204 310 70 428
## [76] 214 482 507 96 201 95 471 342 386 541 360 164 54 64 22
## [91] 501 241 123 844 374 33 244 52 347 838 36 69 407 330 296
## [106] 322 589 441 319 567 83 592 253 313 116 691 428 171 181 241
## [121] 112 244 439 212 1000 92 190 191 216 499 217 710 168 270 256
## [136] 251 226 349 103 241 332 270 251 557 332 74 133 121 331 2
Count the reads of all the splice junctions of the gene harboring the variant:
GeneSJ$rowSum_SJtotal <- rowSums(GeneSJ[,grep("chr", names(GeneSJ))])
Normalization of the expression by the total read counts of all the splice junctions of the gene:
GeneSJ$Normalized_CanonEx19_20 <- (GeneSJ$chr13_28018590_28023349)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_CanonEx20_21 <- (GeneSJ$chr13_28015702_28018466)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_SE20 <- (GeneSJ$chr13_28015702_28023349)/GeneSJ$rowSum_SJtotal*100
GeneSJ$INCLUSION_Ex20 <- GeneSJ$chr13_28018590_28023349 + GeneSJ$chr13_28015702_28018466
GeneSJ$PSI_SE20 <- (GeneSJ$INCLUSION_Ex20)/(GeneSJ$chr13_28015702_28023349+GeneSJ$INCLUSION_Ex20)
GeneSJ$Normalized_SE1920 <- (GeneSJ$chr13_28015702_28024860)/GeneSJ$rowSum_SJtotal*100
Download the normalized values for the assessed splice junctions of all the AML samples:
Mutated samples vaf:
Canonical splice junction: Exon 20-21 chr13:28015702-28018466; donor splice site chr13:28018466
Splicing alterations:
Canonical splice junction: Exon 20-21 chr13:28015702-28018466; donor splice site chr13:28018466
Splicing alterations:
Violin Plots for the alternative splice junctions interrogated:
SJCounts <- GeneSJ
Normality Test:
shapiro.test(SJCounts$Normalized_CanonEx20_21[SJCounts$GROUP == "WT"])
##
## Shapiro-Wilk normality test
##
## data: SJCounts$Normalized_CanonEx20_21[SJCounts$GROUP == "WT"]
## W = 0.87793, p-value = 1.08e-09
Value of Mean Normalized Expression of the Alternative SJ in WT samples:
mean_WT_SJi <- mean(SJCounts$Normalized_CanonEx20_21[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 4.634878
Normalized Expression Value of the Alternative SJ in the MUT sample:
MUT_SJi <- SJCounts$Normalized_CanonEx20_21[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 4.701734 3.881225
Deviation from the mean normalized expression:
SJCounts$Difference <- SJCounts$Normalized_CanonEx20_21 - mean_WT_SJi
Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression
SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] 0.06685609 -0.75365251
v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
## x[1:148] = -1.9682, -1.4945, -1.4716, ..., 3.2724, 4.9872
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))
v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.6216216 0.1689189
print(paste0("MUT Percentile: ", v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])))
## [1] "MUT Percentile: 0.621621621621622" "MUT Percentile: 0.168918918918919"
print(paste0("Inferred Pvalue: ", v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])))
## [1] "Inferred Pvalue: 0.621621621621622" "Inferred Pvalue: 0.168918918918919"
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_CanonEx20_21")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")
MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- MUT_df$ECDF
MUT_df$Prediction <- "Donor Loss"
MUT_df$splice_junction_status <- "CanonicalSJ"
MUT_df$splice_junction_position <- "chr13:28015702-28018466"
MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)
Download the vaf, inferred percentiles and pvalues of the mutated samples:
Normality Test:
shapiro.test(SJCounts$Normalized_SE20[SJCounts$GROUP == "WT"])
##
## Shapiro-Wilk normality test
##
## data: SJCounts$Normalized_SE20[SJCounts$GROUP == "WT"]
## W = 0.85842, p-value = 1.298e-10
Value of Mean Normalized Expression of the SJ in WT samples:
mean_WT_SJi <- mean(SJCounts$Normalized_SE20[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 0.1460614
Normalized Expression Value of the SJ in the MUT sample:
MUT_SJi <- SJCounts$Normalized_SE20[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 0.2938584 0.1402852
Deviation from the mean normalized expression:
SJCounts$Difference <- SJCounts$Normalized_SE20 - mean_WT_SJi
Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression
SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] 0.147796993 -0.005776121
v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
## x[1:126] = -0.14606, -0.12376, -0.11948, ..., 0.37583, 0.73747
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))
v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.9121622 0.5675676
print(paste0("MUT Percentile: ", v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])))
## [1] "MUT Percentile: 0.912162162162162" "MUT Percentile: 0.567567567567568"
1-v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.08783784 0.43243243
print(paste0("Inferred Pvalue: ", 1-v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])))
## [1] "Inferred Pvalue: 0.0878378378378378" "Inferred Pvalue: 0.432432432432432"
Download the vaf, inferred percentiles and pvalues of the mutated samples:
Normality Test:
shapiro.test(SJCounts$Normalized_SE1920[SJCounts$GROUP == "WT"])
##
## Shapiro-Wilk normality test
##
## data: SJCounts$Normalized_SE1920[SJCounts$GROUP == "WT"]
## W = 0.26624, p-value < 2.2e-16
Value of Mean Normalized Expression of the SJ in WT samples:
mean_WT_SJi <- mean(SJCounts$Normalized_SE1920[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 0.007997
Value of Mean Normalized Expression of the SJ in WT samples:
MUT_SJi <- SJCounts$Normalized_SE1920[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 0.02938584 0.00000000
Deviation from the mean normalized expression:
SJCounts$Difference <- SJCounts$Normalized_SE1920 - mean_WT_SJi
Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression
SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] 0.02138884 -0.00799700
v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
## x[1:26] = -0.007997, -0.0025172, -0.00058025, ..., 0.15099, 0.29722
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))
v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.9256757 0.8310811
print(paste0("MUT Percentile: ", v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])))
## [1] "MUT Percentile: 0.925675675675676" "MUT Percentile: 0.831081081081081"
1-v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.07432432 0.16891892
print(paste0("Inferred Pvalue: ", 1-v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])))
## [1] "Inferred Pvalue: 0.0743243243243243" "Inferred Pvalue: 0.168918918918919"
v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.9256757 0.8310811
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_SE1920")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")
MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- 1- MUT_df$ECDF
MUT_df$Prediction <- "Exon Skipping"
MUT_df$splice_junction_status <- "AlternativeSJ found in MUT samples"
MUT_df$splice_junction_position <- "chr13:28015702-28024860"
MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)
Download the vaf, inferred percentiles and pvalues of the mutated samples:
## Add not found effects information:
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id")]
MUT_df$Prediction <- "Exon Skipping"
MUT_df$splice_junction_status <- "AlternativeSJ not found in MUT samples"
MUT_df$splice_junction_position <- "chr13:28018590-28024860"
MUT_df$ECDF <- NA
MUT_df$Pvalue <- NA
MUT_df$NormalizedExpression <- NA
vaf_select <- annot_vaf[,c("Gene","MutationKey_Hg38","HGVSc", "HGVSp","sample_id","RNA_Sample","DNA_Sample","MuTect2_vaf","VarScan2_vaf","SomaticSniper_vaf","MuSE_vaf", "VAF_RNA")]
MUT_vaf <- merge(MUT_df, vaf_select, by.x="sample_id", by.y = "RNA_Sample")
colnames(MUT_vaf)[colnames(MUT_vaf) == "sample_id"] <- "RNA_Sample"
colnames(MUT_vaf)[colnames(MUT_vaf) == "sample_id.y"] <- "sample_id"
col_order <- c("Gene", "MutationKey_Hg38","HGVSc" ,"HGVSp" ,"Prediction","splice_junction_status","splice_junction_position","sample_id", "case_id","RNA_Sample","DNA_Sample","NormalizedExpression","ECDF", "Pvalue", "VAF_RNA","MuTect2_vaf","VarScan2_vaf","SomaticSniper_vaf","MuSE_vaf")
MUT_vaf <- MUT_vaf[, col_order]
Three (3) variants: chr2,208248389,G,A, chr2,208248389,G,T & chr2,208248389,G,C
Variants found in 16 patients of the TCGA (16 samples)
Patients with the variant and RNASeq for validation: 11 patients (11 samples)
The splicing alterations being assessed are:
Variant information:
Load the extracted splice junctions of the gene harboring the mutation.
extractedSJ_path <- paste0(extractedSJ_dir_in,"IDH1_UM_annotSJ.tsv")
GeneSJ <- read.delim(extractedSJ_path, sep ="\t")
Set the sample’s group: Mutated (MUT) or No Mutated (WT)
samples_df <- found_variants[found_variants$Gene=="IDH1" & found_variants$MutationKey_Hg38 %in% c( "chr2,208248389,G,A","chr2,208248389,G,T" ,"chr2,208248389,G,C"),]
R132C <- samples_df$RNA_Sample[samples_df$MutationKey_Hg38 == "chr2,208248389,G,A" & samples_df$Validable =="Validable"] #n=9 G>A
R132G <- samples_df$RNA_Sample[samples_df$MutationKey_Hg38 == "chr2,208248389,G,C" & samples_df$Validable =="Validable"] #n=1 G>C
R132S <- samples_df$RNA_Sample[samples_df$MutationKey_Hg38 == "chr2,208248389,G,T" & samples_df$Validable =="Validable"] #n=1 G>T
cases <- append(R132C, R132G)
cases <- append(cases, R132S)
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% R132C, "MUT",
ifelse(GeneSJ$sample_id %in% R132G, "MUT",
ifelse(GeneSJ$sample_id %in% R132S, "MUT",
"WT")))
GeneSJ$Variant_status <- ifelse(GeneSJ$sample_id %in% R132C, "R132C",
ifelse(GeneSJ$sample_id %in% R132G, "R132G",
ifelse(GeneSJ$sample_id %in% R132S, "R132S",
"WT")))
Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).
Search: chr2:208245425-208248422
Show all the splice junctions containing the position 208245425-208248422
colnames(GeneSJ)[grep("208245425_208248422",colnames(GeneSJ))]
## [1] "chr2_208245425_208248422"
Found: chr2_208245425_208248422
Reads of all the AML samples (mutated and no mutated) for the splice junction:
GeneSJ$chr2_208245425_208248422
## [1] 0 0 0 0 0 0 2 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
## [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
## [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0
## [149] 0 0
Samples with the SJ of interest:
table(GeneSJ$chr2_208245425_208248422>0)
##
## FALSE TRUE
## 143 7
Groups of the samples having the alternative splice junction:
table(GeneSJ$GROUP[GeneSJ$chr2_208245425_208248422 > 0])
##
## MUT WT
## 3 4
Alternative SJ found in the mutated samples.
Search: chr2:208245425-208251429
Show all the splice junctions containing the positions 208245425-208251429
colnames(GeneSJ)[grep("208245425_208251429",colnames(GeneSJ))]
## character(0)
Alternative SJ not found in the splice junction collection.
Exon 3-4: chr2:208248661-208251429
Exon 4-5: chr2:208245425-208248368; splice site chr2:208248368
Reads of all the AML samples (mutated and no mutated) for the splice junction:
GeneSJ$chr2_208248661_208251429
## [1] 106 188 65 275 159 47 122 94 134 100 188 180 224 33 253 181 99 92
## [19] 169 53 83 104 111 44 273 97 128 28 79 262 74 57 62 151 150 122
## [37] 97 152 77 488 371 60 129 184 156 81 131 166 60 70 264 196 114 100
## [55] 245 95 62 114 326 197 213 20 68 13 78 32 111 67 7 55 302 123
## [73] 102 34 209 77 230 64 123 124 123 130 106 193 291 80 177 43 96 95
## [91] 82 226 93 92 130 73 369 132 167 363 53 85 488 265 159 67 433 162
## [109] 251 388 111 91 68 170 95 230 217 141 64 156 181 206 100 95 288 57
## [127] 187 72 201 68 212 185 154 142 95 121 72 113 177 55 192 351 59 177
## [145] 186 110 54 92 51 144
Reads of all the AML samples (mutated and no mutated) for the splice junction:
GeneSJ$chr2_208245425_208248368
## [1] 81 171 42 260 115 30 97 72 120 66 124 147 165 13 243 146 86 58
## [19] 152 37 71 58 77 41 234 40 99 16 45 269 80 51 64 129 146 99
## [37] 87 149 67 480 283 43 93 91 154 63 115 116 45 53 274 172 97 92
## [55] 208 102 53 73 240 181 194 20 55 8 104 32 117 42 6 48 253 108
## [73] 119 37 158 38 217 62 125 112 55 122 104 160 272 54 121 27 69 82
## [91] 60 186 80 53 90 72 312 104 134 360 25 80 430 183 124 57 401 132
## [109] 200 383 103 63 48 143 78 198 190 117 42 141 142 131 62 69 230 36
## [127] 117 72 172 54 222 186 104 141 70 132 69 102 196 33 172 347 37 174
## [145] 122 126 28 133 72 144
Count the reads of all the splice junctions of the gene harboring the variant:
GeneSJ$rowSum_SJtotal <- rowSums(GeneSJ[,grep("chr", names(GeneSJ))])
Normalization of the expression by the total read counts of all the splice junctions of the gene:
GeneSJ$Normalized_CanonEx3_4 <- (GeneSJ$chr2_208248661_208251429)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_CanonEx4_5 <- (GeneSJ$chr2_208245425_208248368)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_DGEx4 <- (GeneSJ$chr2_208245425_208248422)/GeneSJ$rowSum_SJtotal*100
Download the normalized values for the assessed splice junctions of all the AML samples:
Mutated samples vaf:
Canonical splice junction: Exon 4-5: chr2:208245425-208248368; donor splice site chr2:208248368
Splicing alterations:
Canonical splice junction: Exon 4-5: chr2:208245425-208248368; donor splice site chr2:208248368
Splicing alteration:
Violin Plots for the alternative splice junctions interrogated:
SJCounts <- GeneSJ
Normality Test:
shapiro.test(SJCounts$Normalized_DGEx4[SJCounts$GROUP == "WT"])
##
## Shapiro-Wilk normality test
##
## data: SJCounts$Normalized_DGEx4[SJCounts$GROUP == "WT"]
## W = 0.13758, p-value < 2.2e-16
Value of Mean Normalized Expression of the Alternative SJ in WT samples:
mean_WT_SJi <- mean(SJCounts$Normalized_DGEx4[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 0.0033961
Normalized Expression Value of the Alternative SJ in the MUT sample:
MUT_SJi <- SJCounts$Normalized_DGEx4[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 0.08210181 0.00000000 0.00000000 0.30581040 0.00000000 0.00000000
## [7] 0.00000000 0.00000000 0.26954178 0.00000000 0.00000000
Deviation from the mean normalized expression:
SJCounts$Difference <- SJCounts$Normalized_DGEx4 - mean_WT_SJi
Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression
SJCounts[SJCounts$GROUP == "MUT", c("sample_id", "Difference")]
## sample_id Difference
## 11 TCGA-AB-2901-03A 0.07870571
## 21 TCGA-AB-2822-03A -0.00339610
## 35 TCGA-AB-2863-03A -0.00339610
## 74 TCGA-AB-2867-03A 0.30241430
## 86 TCGA-AB-2919-03A -0.00339610
## 98 TCGA-AB-2992-03A -0.00339610
## 112 TCGA-AB-2990-03B -0.00339610
## 133 TCGA-AB-2821-03A -0.00339610
## 140 TCGA-AB-3011-03A 0.26614568
## 143 TCGA-AB-2928-03A -0.00339610
## 149 TCGA-AB-2977-03B -0.00339610
v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
## x[1:5] = -0.0033961, 0.03808, 0.07112, 0.13785, 0.21143
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))
v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.9856115 0.9712230 0.9712230 1.0000000 0.9712230 0.9712230 0.9712230
## [8] 0.9712230 1.0000000 0.9712230 0.9712230
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_DGEx4")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")
MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <-NA
MUT_df$Prediction <- "Donor Gain"
MUT_df$splice_junction_status <- "AlternativeSJ found in MUT samples"
MUT_df$splice_junction_position <- "chr2:208245425-208248422"
MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
Download the vaf, inferred percentiles and pvalues of the mutated samples:
Kruskal Wallis:
kruskal.test(Normalized_DGEx4 ~ Variant_status, data = SJCounts)
##
## Kruskal-Wallis rank sum test
##
## data: Normalized_DGEx4 by Variant_status
## Kruskal-Wallis chi-squared = 28.911, df = 3, p-value = 2.338e-06
Pairwise comparisons:
pairwise.wilcox.test(SJCounts$Normalized_DGEx4, SJCounts$Variant_status)
##
## Pairwise comparisons using Wilcoxon rank sum test with continuity correction
##
## data: SJCounts$Normalized_DGEx4 and SJCounts$Variant_status
##
## R132C R132G R132S
## R132G 1.00 - -
## R132S 0.80 1.00 -
## WT 0.02 1.00 6.6e-07
##
## P value adjustment method: holm
Detailed:
pkw <- pairwise_wilcox_test(SJCounts,Normalized_DGEx4 ~ Variant_status)
pkw
## # A tibble: 6 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 Normalized_D… R132C R132G 9 1 5.5 8.04e-1 1 e+0 ns
## 2 Normalized_D… R132C R132S 9 1 1 1.99e-1 7.96e-1 ns
## 3 Normalized_D… R132C WT 9 139 748. 4 e-3 2 e-2 *
## 4 Normalized_D… R132G R132S 1 1 0 1 e+0 1 e+0 ns
## 5 Normalized_D… R132G WT 1 139 67.5 8.98e-1 1 e+0 ns
## 6 Normalized_D… R132S WT 1 139 139 1.09e-7 6.54e-7 ****
Normality Test:
shapiro.test(SJCounts$Normalized_CanonEx4_5[SJCounts$GROUP == "WT"])
##
## Shapiro-Wilk normality test
##
## data: SJCounts$Normalized_CanonEx4_5[SJCounts$GROUP == "WT"]
## W = 0.97019, p-value = 0.003895
Value of Mean Normalized Expression of the Alternative SJ in WT samples:
mean_WT_SJi <- mean(SJCounts$Normalized_CanonEx4_5[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 11.69597
Normalized Expression Value of the Alternative SJ in the MUT sample:
MUT_SJi <- SJCounts$Normalized_CanonEx4_5[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 10.180624 12.434326 12.618842 11.314985 8.047690 12.149533 9.183673
## [8] 11.366120 8.894879 8.604651 14.723926
Deviation from the mean normalized expression:
SJCounts$Difference <- SJCounts$Normalized_CanonEx4_5 - mean_WT_SJi
Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression
SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] -1.5153475 0.7383542 0.9228703 -0.3809868 -3.6482815 0.4535612
## [7] -2.5122980 -0.3298513 -2.8010928 -3.0913203 3.0279549
v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
## x[1:137] = -6.5809, -5.8401, -5.6354, ..., 4.0065, 4.9974
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))
v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.20143885 0.62589928 0.64748201 0.34532374 0.05035971 0.52517986
## [7] 0.11510791 0.35251799 0.10071942 0.07194245 0.97122302
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_CanonEx4_5")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")
MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- NA
MUT_df$Prediction <- "Donor Loss"
MUT_df$splice_junction_status <- "CanonicalSJ"
MUT_df$splice_junction_position <- "chr2:208245425-208248368"
MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
Download the vaf, inferred percentiles and pvalues of the mutated samples:
Kruskal Wallis:
kruskal.test(Normalized_CanonEx4_5 ~ Variant_status, data = SJCounts)
##
## Kruskal-Wallis rank sum test
##
## data: Normalized_CanonEx4_5 by Variant_status
## Kruskal-Wallis chi-squared = 4.1238, df = 3, p-value = 0.2484
Pairwise comparisons:
pairwise.wilcox.test(SJCounts$Normalized_CanonEx4_5, SJCounts$Variant_status)
##
## Pairwise comparisons using Wilcoxon rank sum exact test
##
## data: SJCounts$Normalized_CanonEx4_5 and SJCounts$Variant_status
##
## R132C R132G R132S
## R132G 1 - -
## R132S 1 1 -
## WT 1 1 1
##
## P value adjustment method: holm
Detailed:
pkw <- pairwise_wilcox_test(SJCounts,Normalized_CanonEx4_5 ~ Variant_status)
pkw
## # A tibble: 6 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 Normalized_Canon… R132C R132G 9 1 7 0.6 1 ns
## 2 Normalized_Canon… R132C R132S 9 1 7 0.6 1 ns
## 3 Normalized_Canon… R132C WT 9 139 527 0.432 1 ns
## 4 Normalized_Canon… R132G R132S 1 1 1 1 1 ns
## 5 Normalized_Canon… R132G WT 1 139 16 0.19 1 ns
## 6 Normalized_Canon… R132S WT 1 139 14 0.174 1 ns
Variant found in 2 patients of the TCGA (2 samples).
Patients with CBL chr11,119278165,G,C variant: 1 patient (1 sample)
Patients with CBL chr11,119278164,A,T variant: 1 patient (1 sample)
Patients with the variant and RNASeq for validation: 2 patients (2 samples)
The splicing alterations being assessed are:
Variant information:
Load the extracted splice junctions of the gene harboring the mutation.
extractedSJ_path <- paste0(extractedSJ_dir_in,"CBL_UM_annotSJ.tsv")
GeneSJ <- read.delim(extractedSJ_path, sep ="\t")
Set the sample’s group: Mutated (MUT) or No Mutated (WT)
samples_df <- found_variants[found_variants$Gene=="CBL" & found_variants$MutationKey_Hg38 %in% c("chr11,119278165,G,C","chr11,119278164,A,T"),]
cases <- samples_df$RNA_Sample[samples_df$Validable == "Validable"]
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases , "MUT", "WT")
Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).
Search: chr11:119277845-119278189
Show all the splice junctions containing the position 119277845-119278189
colnames(GeneSJ)[grep("119277845_119278189",colnames(GeneSJ))]
## [1] "chr11_119277845_119278189"
Found: chr11_119277845_119278189
Reads of all the AML samples (mutated and no mutated) for the splice junction:
GeneSJ$chr11_119277845_119278189
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [26] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [51] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0
## [76] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 15
## [101] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [126] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Samples with the SJ of interest:
table(GeneSJ$chr11_119277845_119278189>0)
##
## FALSE TRUE
## 148 2
Groups of the samples having the alternative splice junction:
table(GeneSJ$GROUP[GeneSJ$chr11_119277845_119278189 > 0])
##
## MUT
## 2
Alternative SJ found in the mutated samples.
Search: chr11:119277845-119278237
Show all the splice junctions containing the position 119277845-119278237
colnames(GeneSJ)[grep("119277845_119278237",colnames(GeneSJ))]
## [1] "chr11_119277845_119278237"
Found: chr11_119277845_119278237
Reads of all the AML samples (mutated and no mutated) for the splice junction:
GeneSJ$chr11_119277845_119278237
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [26] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [51] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 0 0 0 0 0 0 0 0 0 0
## [76] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 20
## [101] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [126] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Samples with the SJ of interest:
table(GeneSJ$chr11_119277845_119278237>0)
##
## FALSE TRUE
## 148 2
Groups of the samples having the alternative splice junction:
table(GeneSJ$GROUP[GeneSJ$chr11_119277845_119278237 > 0])
##
## MUT
## 2
Alternative SJ found in the mutated samples.
Exon 7-8: chr11:119277845-119278165; aceptor splice site: chr11:119278165
Exon 8-9: chr11:119278298-119278509
Reads of all the AML samples (mutated and no mutated) for the splice junction:
GeneSJ$chr11_119277845_119278165
## [1] 59 78 85 131 67 134 56 98 49 33 101 101 174 43 64 68 78 59
## [19] 62 49 58 122 138 61 91 6 58 44 64 33 106 48 75 34 63 21
## [37] 45 78 59 63 66 66 36 42 95 57 87 142 35 60 17 119 41 49
## [55] 103 48 28 33 98 122 117 10 81 30 44 21 127 59 10 47 98 105
## [73] 129 52 66 12 76 30 39 46 20 104 63 55 18 63 83 53 64 58
## [91] 60 71 45 136 66 46 99 94 56 60 86 40 95 102 76 61 39 63
## [109] 74 102 133 18 44 107 73 95 88 61 14 42 64 60 86 14 167 59
## [127] 162 55 68 32 79 73 113 108 99 63 42 72 111 34 52 119 70 167
## [145] 77 36 117 91 5 19
Reads of all the AML samples (mutated and no mutated) for the splice junction:
GeneSJ$chr11_119278298_119278509
## [1] 78 87 94 136 67 175 49 93 59 51 105 119 146 12 48 85 80 85
## [19] 64 63 46 69 162 48 109 4 74 50 70 45 106 40 85 51 87 21
## [37] 41 89 69 83 99 76 46 41 117 47 95 126 10 56 20 133 46 42
## [55] 112 43 25 30 103 153 129 17 70 22 75 13 123 46 8 34 131 104
## [73] 124 53 96 37 81 28 50 73 20 139 71 52 10 76 79 74 78 61
## [91] 66 103 62 153 70 40 133 97 67 204 58 30 109 151 103 80 45 67
## [109] 95 108 116 20 50 117 72 106 76 97 23 54 92 63 55 22 165 83
## [127] 155 73 70 51 74 74 136 128 114 49 43 94 113 32 34 132 84 184
## [145] 82 45 96 84 9 24
Count the reads of all the splice junctions of the gene harboring the variant:
GeneSJ$rowSum_SJtotal <- rowSums(GeneSJ[,grep("chr", names(GeneSJ))])
Normalization of the expression by the total read counts of all the splice junctions of the gene:
GeneSJ$Normalized_CanonEx7_8 <- (GeneSJ$chr11_119277845_119278165)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_CanonEx8_9 <- (GeneSJ$chr11_119278298_119278509)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_Ex8AG_chr11.119277845.119278189 <- (GeneSJ$chr11_119277845_119278189)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_Ex8AG_chr11.119277845.119278237 <- (GeneSJ$chr11_119277845_119278237)/GeneSJ$rowSum_SJtotal*100
Download the normalized values for the assessed splice junctions of all the AML samples:
Mutated samples vaf:
Canonical splice junction: Exon 7-8: chr11:119277845-119278165; aceptor splice site: chr11:119278165
Splicing alterations:
Canonical splice junction: Exon 7-8: chr11:119277845-119278165; aceptor splice site: chr11:119278165
Splicing alteration:
Violin Plots for the alternative splice junctions interrogated:
SJCounts <- GeneSJ
Value of Mean Normalized Expression of the Alternative SJ in WT samples:
mean_WT_SJi <- mean(SJCounts$Normalized_CanonEx7_8[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 8.955325
Normalized Expression Value of the Alternative SJ in the MUT sample:
MUT_SJi <- SJCounts$Normalized_CanonEx7_8[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 4.503582 2.631579
Deviation from the mean normalized expression:
SJCounts$Difference <- SJCounts$Normalized_CanonEx7_8 - mean_WT_SJi
Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression
SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] -4.451743 -6.323747
v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
## x[1:145] = -5.2402, -4.1232, -3.8271, ..., 4.1732, 4.3337
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))
v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.006756757 0.000000000
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_CanonEx7_8")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")
MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- MUT_df$ECDF
MUT_df$Prediction <- "Aceptor Loss"
MUT_df$splice_junction_status <- "CanonicalSJ"
MUT_df$splice_junction_position <- "chr11:119277845-119278165"
MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)
Download the vaf, inferred percentiles and pvalues of the mutated samples:
Value of Mean Normalized Expression of the Alternative SJ in WT samples:
mean_WT_SJi <- mean(SJCounts$Normalized_Ex8AG_chr11.119277845.119278189[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 0
Normalized Expression Value of the Alternative SJ in the MUT sample:
MUT_SJi <- SJCounts$Normalized_Ex8AG_chr11.119277845.119278189[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 0.5117707 0.6578947
Deviation from the mean normalized expression:
SJCounts$Difference <- SJCounts$Normalized_Ex8AG_chr11.119277845.119278189 - mean_WT_SJi
Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression
SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] 0.5117707 0.6578947
v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
## x[1:1] = 0
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))
v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 1 1
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_Ex8AG_chr11.119277845.119278189")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")
MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- 1- MUT_df$ECDF
MUT_df$Prediction <- "Aceptor Gain"
MUT_df$splice_junction_status <- "AlternativeSJ found in MUT samples"
MUT_df$splice_junction_position <- "chr11:119277845-119278189"
MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)
Download the vaf, inferred percentiles and pvalues of the mutated samples:
Value of Mean Normalized Expression of the Alternative SJ in WT samples:
mean_WT_SJi <- mean(SJCounts$Normalized_Ex8AG_chr11.119277845.119278237[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 0
Normalized Expression Value of the Alternative SJ in the MUT sample:
MUT_SJi <- SJCounts$Normalized_Ex8AG_chr11.119277845.119278237[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 0.8188332 0.8771930
Deviation from the mean normalized expression:
SJCounts$Difference <- SJCounts$Normalized_Ex8AG_chr11.119277845.119278237 - mean_WT_SJi
Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression
SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] 0.8188332 0.8771930
v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
## x[1:1] = 0
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))
v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 1 1
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_Ex8AG_chr11.119277845.119278237")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")
MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- 1- MUT_df$ECDF
MUT_df$Prediction <- "Aceptor Gain"
MUT_df$splice_junction_status <- "AlternativeSJ found in MUT samples"
MUT_df$splice_junction_position <- "chr11:119277845-119278237"
MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)
Download the vaf, inferred percentiles and pvalues of the mutated samples:
Download the vaf, inferred percentiles and pvalues of all the splicing alterations evaluated: